Covid exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Arapan-S: a fast and highly accurate whole-genome assembly software for viruses and small genomes

Internal identifier: 000B87 (Main/Exploration); previous: 000B86; next: 000B88

Authors: Mohammed Sahli [Japan]; Tetsuo Shibuya [Japan]

Source: BMC Research Notes, 2012

RBID : PMC:3441218

Abstract

Background

Genome assembly is considered to be a challenging problem in computational biology, and has been studied extensively by many researchers. It is extremely difficult to build a general assembler that is able to reconstruct the original sequence instead of many contigs. However, we believe that creating specific assemblers, for solving specific cases, will be much more fruitful than creating general assemblers.

Findings

In this paper, we present Arapan-S, a whole-genome assembly program dedicated to handling small genomes. It provides only one contig (along with the reverse complement of this contig) in many cases. Although genomes consist of a number of segments, the implemented algorithm can detect all the segments, as we demonstrate for Influenza Virus A. The Arapan-S program is based on the de Bruijn graph. We have implemented a very sophisticated and fast method to reconstruct the original sequence and neglect erroneous k-mers. The method explores the graph by using neither the shortest nor the longest path, but rather a specific and reliable path based on the coverage level or k-mers’ lengths. Arapan-S uses short reads, and it was tested on raw data downloaded from the NCBI Trace Archive.
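
The coverage-guided traversal described above can be illustrated with a minimal sketch. This is not the Arapan-S implementation: the function names, the `min_coverage` threshold, and the greedy "follow the highest-coverage unused edge" rule are all assumptions chosen to show the general idea of building a de Bruijn graph from short reads, dropping low-coverage (likely erroneous) k-mers, and reporting a contig together with its reverse complement.

```python
from collections import Counter, defaultdict


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_kmers(reads, k):
    """Count how often each k-mer occurs in the reads (its coverage)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_graph(kmer_counts, min_coverage):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    k-mers seen fewer than min_coverage times are treated as likely
    sequencing errors and dropped (a hypothetical filtering rule).
    """
    graph = defaultdict(dict)
    for kmer, cov in kmer_counts.items():
        if cov >= min_coverage:
            graph[kmer[:-1]][kmer[1:]] = cov
    return graph


def assemble(graph, start):
    """Walk from start, always taking the unused edge with the highest coverage."""
    contig, node, used = start, start, set()
    while True:
        candidates = [(cov, nxt) for nxt, cov in graph.get(node, {}).items()
                      if (node, nxt) not in used]
        if not candidates:
            return contig
        _, nxt = max(candidates)
        used.add((node, nxt))
        contig += nxt[-1]
        node = nxt


# Toy data: overlapping reads from a made-up 13-base sequence,
# plus one read carrying a single substitution error.
toy_genome = "ATGGCGTGCAATC"
toy_reads = [toy_genome[i:i + 8] for i in range(len(toy_genome) - 7)] * 2
toy_reads.append("GCGTACAA")

contig = assemble(build_graph(count_kmers(toy_reads, 4), min_coverage=2), "ATG")
print(contig)
print(reverse_complement(contig))
```

On this toy input the erroneous k-mers occur only once, so either the coverage threshold or the highest-coverage rule alone is enough to keep the walk on the true sequence. Real data with repeats would need a more careful traversal than this greedy walk.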

Conclusions

Our findings show that the accuracy of the assembly was very high; the result was checked against the European Bioinformatics Institute (EBI) database using the NCBI BLAST Sequence Similarity Search. The identity and the genome coverage were more than 99%. We also compared the efficiency of Arapan-S with other well-known assemblers. In dealing with small genomes, the accuracy of Arapan-S is significantly higher than the accuracy of other assemblers. The assembly process is very fast and requires only a few seconds.
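
As a rough illustration of the two figures quoted above, the sketch below computes percent identity and genome coverage from a pairwise alignment. The abstract does not state how the authors derived these numbers from the BLAST output, so the definitions here are assumptions: identity is taken as matching columns over alignment length, and coverage as reference bases aligned to a non-gap query base over the reference length. The function names are hypothetical.

```python
def percent_identity(aligned_query, aligned_ref):
    """Percentage of alignment columns where both sequences carry the same base."""
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(q == r and q != "-"
                  for q, r in zip(aligned_query, aligned_ref))
    return 100.0 * matches / len(aligned_ref)


def genome_coverage(aligned_query, aligned_ref, ref_length):
    """Percentage of reference bases aligned to a query base rather than a gap."""
    covered = sum(q != "-" and r != "-"
                  for q, r in zip(aligned_query, aligned_ref))
    return 100.0 * covered / ref_length


# Toy alignment ("-" marks a gap); the sequences are made up.
query = "ACGT-ACGTA"
ref = "ACGTTACGAA"
print(percent_identity(query, ref))
print(genome_coverage(query, ref, ref_length=12))
```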

Arapan-S is available for free to the public. The binary files for Arapan-S are available through http://sourceforge.net/projects/dnascissor/files/.


Url:
DOI: 10.1186/1756-0500-5-243
PubMed: 22591859
PubMed Central: 3441218


Affiliations:



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Arapan-S: a fast and highly accurate whole-genome assembly software for viruses and small genomes</title>
<author>
<name sortKey="Sahli, Mohammed" sort="Sahli, Mohammed" uniqKey="Sahli M" first="Mohammed" last="Sahli">Mohammed Sahli</name>
<affiliation wicri:level="4">
<nlm:aff id="I1">Department of Computer Science, Graduate School of Information Science and Technology, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan</nlm:aff>
<country xml:lang="fr">Japon</country>
<wicri:regionArea>Department of Computer Science, Graduate School of Information Science and Technology, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033</wicri:regionArea>
<orgName type="university">Université de Tokyo</orgName>
<placeName>
<settlement type="city">Tokyo</settlement>
<region type="province">Région de Kantō</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Shibuya, Tetsuo" sort="Shibuya, Tetsuo" uniqKey="Shibuya T" first="Tetsuo" last="Shibuya">Tetsuo Shibuya</name>
<affiliation wicri:level="4">
<nlm:aff id="I2">Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo, 108-8639, Japan</nlm:aff>
<country xml:lang="fr">Japon</country>
<wicri:regionArea>Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo, 108-8639</wicri:regionArea>
<orgName type="university">Université de Tokyo</orgName>
<placeName>
<settlement type="city">Tokyo</settlement>
<region type="province">Région de Kantō</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">22591859</idno>
<idno type="pmc">3441218</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3441218</idno>
<idno type="RBID">PMC:3441218</idno>
<idno type="doi">10.1186/1756-0500-5-243</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000395</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000395</idno>
<idno type="wicri:Area/Pmc/Curation">000395</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000395</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000525</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000525</idno>
<idno type="wicri:Area/Ncbi/Merge">000276</idno>
<idno type="wicri:Area/Ncbi/Curation">000276</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000276</idno>
<idno type="wicri:Area/Main/Merge">000B89</idno>
<idno type="wicri:Area/Main/Curation">000B87</idno>
<idno type="wicri:Area/Main/Exploration">000B87</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Arapan-S: a fast and highly accurate whole-genome assembly software for viruses and small genomes</title>
<author>
<name sortKey="Sahli, Mohammed" sort="Sahli, Mohammed" uniqKey="Sahli M" first="Mohammed" last="Sahli">Mohammed Sahli</name>
<affiliation wicri:level="4">
<nlm:aff id="I1">Department of Computer Science, Graduate School of Information Science and Technology, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan</nlm:aff>
<country xml:lang="fr">Japon</country>
<wicri:regionArea>Department of Computer Science, Graduate School of Information Science and Technology, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033</wicri:regionArea>
<orgName type="university">Université de Tokyo</orgName>
<placeName>
<settlement type="city">Tokyo</settlement>
<region type="province">Région de Kantō</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Shibuya, Tetsuo" sort="Shibuya, Tetsuo" uniqKey="Shibuya T" first="Tetsuo" last="Shibuya">Tetsuo Shibuya</name>
<affiliation wicri:level="4">
<nlm:aff id="I2">Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo, 108-8639, Japan</nlm:aff>
<country xml:lang="fr">Japon</country>
<wicri:regionArea>Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo, 108-8639</wicri:regionArea>
<orgName type="university">Université de Tokyo</orgName>
<placeName>
<settlement type="city">Tokyo</settlement>
<region type="province">Région de Kantō</region>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Research Notes</title>
<idno type="eISSN">1756-0500</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Genome assembly is considered to be a challenging problem in computational biology, and has been studied extensively by many researchers. It is extremely difficult to build a general assembler that is able to reconstruct the original sequence instead of many contigs. However, we believe that creating specific assemblers, for solving specific cases, will be much more fruitful than creating general assemblers.</p>
</sec>
<sec>
<title>Findings</title>
<p>In this paper, we present Arapan-S, a whole-genome assembly program dedicated to handling small genomes. It provides only one contig (along with the reverse complement of this contig) in many cases. Although genomes consist of a number of segments, the implemented algorithm can detect all the segments, as we demonstrate for <italic>Influenza Virus A</italic>. The Arapan-S program is based on the de Bruijn graph. We have implemented a very sophisticated and fast method to reconstruct the original sequence and neglect erroneous <italic>k</italic>-mers. The method explores the graph by using neither the shortest nor the longest path, but rather a specific and reliable path based on the coverage level or <italic>k</italic>-mers’ lengths. Arapan-S uses short reads, and it was tested on raw data downloaded from the NCBI Trace Archive.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>Our findings show that the accuracy of the assembly was very high; the result was checked against the European Bioinformatics Institute (EBI) database using the NCBI BLAST Sequence Similarity Search. The identity and the genome coverage were more than 99%. We also compared the efficiency of Arapan-S with other well-known assemblers. In dealing with small genomes, the accuracy of Arapan-S is significantly higher than the accuracy of other assemblers. The assembly process is very fast and requires only a few seconds.</p>
<p>Arapan-S is available for free to the public. The binary files for Arapan-S are available through <ext-link ext-link-type="uri" xlink:href="http://sourceforge.net/projects/dnascissor/files/">http://sourceforge.net/projects/dnascissor/files/</ext-link>.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Sutton, Gg" uniqKey="Sutton G">GG Sutton</name>
</author>
<author>
<name sortKey="White, O" uniqKey="White O">O White</name>
</author>
<author>
<name sortKey="Adams, Md" uniqKey="Adams M">MD Adams</name>
</author>
<author>
<name sortKey="Kerlavage, Ar" uniqKey="Kerlavage A">AR Kerlavage</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, X" uniqKey="Huang X">X Huang</name>
</author>
<author>
<name sortKey="Madan, A" uniqKey="Madan A">A Madan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, X" uniqKey="Huang X">X Huang</name>
</author>
<author>
<name sortKey="Wang, J" uniqKey="Wang J">J Wang</name>
</author>
<author>
<name sortKey="Aluru, S" uniqKey="Aluru S">S Aluru</name>
</author>
<author>
<name sortKey="Yang, Sp" uniqKey="Yang S">SP Yang</name>
</author>
<author>
<name sortKey="Hillier, L" uniqKey="Hillier L">L Hillier</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Myers, Ew" uniqKey="Myers E">EW Myers</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chevreux, B" uniqKey="Chevreux B">B Chevreux</name>
</author>
<author>
<name sortKey="Pfisterer, T" uniqKey="Pfisterer T">T Pfisterer</name>
</author>
<author>
<name sortKey="Drescher, B" uniqKey="Drescher B">B Drescher</name>
</author>
<author>
<name sortKey="Driesel, Aj" uniqKey="Driesel A">AJ Driesel</name>
</author>
<author>
<name sortKey="Muller, Weg" uniqKey="Muller W">WEG Müller</name>
</author>
<author>
<name sortKey="Wetter, T" uniqKey="Wetter T">T Wetter</name>
</author>
<author>
<name sortKey="Suhai, S" uniqKey="Suhai S">S Suhai</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
<author>
<name sortKey="Tang, H" uniqKey="Tang H">H Tang</name>
</author>
<author>
<name sortKey="Waterman, Ms" uniqKey="Waterman M">MS Waterman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Warren, Rl" uniqKey="Warren R">RL Warren</name>
</author>
<author>
<name sortKey="Sutton, Gg" uniqKey="Sutton G">GG Sutton</name>
</author>
<author>
<name sortKey="Jones, Sj" uniqKey="Jones S">SJ Jones</name>
</author>
<author>
<name sortKey="Holt, Ra" uniqKey="Holt R">RA Holt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chaisson, Mj" uniqKey="Chaisson M">MJ Chaisson</name>
</author>
<author>
<name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zerbino, Dr" uniqKey="Zerbino D">DR Zerbino</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zerbino, Dr" uniqKey="Zerbino D">DR Zerbino</name>
</author>
<author>
<name sortKey="Mcewen, Gk" uniqKey="Mcewen G">GK McEwen</name>
</author>
<author>
<name sortKey="Margulies, Eh" uniqKey="Margulies E">EH Margulies</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Butler, J" uniqKey="Butler J">J Butler</name>
</author>
<author>
<name sortKey="Maccallum, I" uniqKey="Maccallum I">I MacCallum</name>
</author>
<author>
<name sortKey="Kleber, M" uniqKey="Kleber M">M Kleber</name>
</author>
<author>
<name sortKey="Shlyakhter, Ia" uniqKey="Shlyakhter I">IA Shlyakhter</name>
</author>
<author>
<name sortKey="Belmonte, Mk" uniqKey="Belmonte M">MK Belmonte</name>
</author>
<author>
<name sortKey="Lander, Es" uniqKey="Lander E">ES Lander</name>
</author>
<author>
<name sortKey="Nusbaum, C" uniqKey="Nusbaum C">C Nusbaum</name>
</author>
<author>
<name sortKey="Jaffe, Db" uniqKey="Jaffe D">DB Jaffe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Maccallum, I" uniqKey="Maccallum I">I Maccallum</name>
</author>
<author>
<name sortKey="Przybylski, D" uniqKey="Przybylski D">D Przybylski</name>
</author>
<author>
<name sortKey="Gnerre, S" uniqKey="Gnerre S">S Gnerre</name>
</author>
<author>
<name sortKey="Burton, J" uniqKey="Burton J">J Burton</name>
</author>
<author>
<name sortKey="Shlyakhter, I" uniqKey="Shlyakhter I">I Shlyakhter</name>
</author>
<author>
<name sortKey="Gnirke, A" uniqKey="Gnirke A">A Gnirke</name>
</author>
<author>
<name sortKey="Malek, J" uniqKey="Malek J">J Malek</name>
</author>
<author>
<name sortKey="Mckernan, K" uniqKey="Mckernan K">K McKernan</name>
</author>
<author>
<name sortKey="Ranade, S" uniqKey="Ranade S">S Ranade</name>
</author>
<author>
<name sortKey="Shea, Tp" uniqKey="Shea T">TP Shea</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Simpson, Jt" uniqKey="Simpson J">JT Simpson</name>
</author>
<author>
<name sortKey="Wong, K" uniqKey="Wong K">K Wong</name>
</author>
<author>
<name sortKey="Jackman, Sd" uniqKey="Jackman S">SD Jackman</name>
</author>
<author>
<name sortKey="Schein, Je" uniqKey="Schein J">JE Schein</name>
</author>
<author>
<name sortKey="Jones, Sj" uniqKey="Jones S">SJ Jones</name>
</author>
<author>
<name sortKey="Birol, I" uniqKey="Birol I">I Birol</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, R" uniqKey="Li R">R Li</name>
</author>
<author>
<name sortKey="Zhu, H" uniqKey="Zhu H">H Zhu</name>
</author>
<author>
<name sortKey="Ruan, J" uniqKey="Ruan J">J Ruan</name>
</author>
<author>
<name sortKey="Qian, W" uniqKey="Qian W">W Qian</name>
</author>
<author>
<name sortKey="Fang, X" uniqKey="Fang X">X Fang</name>
</author>
<author>
<name sortKey="Shi, Z" uniqKey="Shi Z">Z Shi</name>
</author>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author>
<name sortKey="Li, S" uniqKey="Li S">S Li</name>
</author>
<author>
<name sortKey="Shan, G" uniqKey="Shan G">G Shan</name>
</author>
<author>
<name sortKey="Kristiansen, K" uniqKey="Kristiansen K">K Kristiansen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bryant, Dw" uniqKey="Bryant D">DW Bryant</name>
</author>
<author>
<name sortKey="Wong, Wk" uniqKey="Wong W">WK Wong</name>
</author>
<author>
<name sortKey="Mockler, Tc" uniqKey="Mockler T">TC Mockler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sommer, Dd" uniqKey="Sommer D">DD Sommer</name>
</author>
<author>
<name sortKey="Dlecher, Al" uniqKey="Dlecher A">AL Dlecher</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
<author>
<name sortKey="Pop, M" uniqKey="Pop M">M Pop</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Medvedev, P" uniqKey="Medvedev P">P Medvedev</name>
</author>
<author>
<name sortKey="Brudno, M" uniqKey="Brudno M">M Brudno</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<affiliations>
<list>
<country>
<li>Japon</li>
</country>
<region>
<li>Région de Kantō</li>
</region>
<settlement>
<li>Tokyo</li>
</settlement>
<orgName>
<li>Université de Tokyo</li>
</orgName>
</list>
<tree>
<country name="Japon">
<region name="Région de Kantō">
<name sortKey="Sahli, Mohammed" sort="Sahli, Mohammed" uniqKey="Sahli M" first="Mohammed" last="Sahli">Mohammed Sahli</name>
</region>
<name sortKey="Shibuya, Tetsuo" sort="Shibuya, Tetsuo" uniqKey="Shibuya T" first="Tetsuo" last="Shibuya">Tetsuo Shibuya</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Sante/explor/CovidV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000B87 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000B87 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Sante
   |area=    CovidV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     PMC:3441218
   |texte=   Arapan-S: a fast and highly accurate whole-genome assembly software for viruses and small genomes
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "PMC:3441218" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a CovidV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Fri Mar 27 18:14:15 2020. Site generation: Sun Jan 31 15:15:08 2021